@jiqing-feng commented Nov 20, 2025

Introduce a BRGEMM (batch-reduce GEMM) kernel that accelerates TTFT (time to first token) by up to 10x; the speed-up grows with input length.
Build commands:
python -c "import torch; print(torch.utils.cmake_prefix_path)"
The output should look like: /opt/venv/lib/python3.12/site-packages/torch/share/cmake
Then pass that path as CMAKE_PREFIX_PATH:
cmake -DCOMPUTE_BACKEND=cpu -DCMAKE_PREFIX_PATH=/opt/venv/lib/python3.12/site-packages/torch/share/cmake -S . && make
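
The two manual steps above can be combined so the path does not have to be copied by hand. The following is only a sketch: the helper names `torch_cmake_prefix`, `cmake_configure_cmd`, and `build` are hypothetical and not part of this PR; the only PyTorch attribute used is `torch.utils.cmake_prefix_path`, the same one queried above.

```python
import shlex
import subprocess


def torch_cmake_prefix() -> str:
    """Return the CMake prefix path of the installed PyTorch."""
    import torch  # imported lazily so the command builder works without torch
    return torch.utils.cmake_prefix_path


def cmake_configure_cmd(prefix_path: str, source_dir: str = ".") -> list[str]:
    """Build the configure command from this comment as an argument list."""
    return [
        "cmake",
        "-DCOMPUTE_BACKEND=cpu",
        f"-DCMAKE_PREFIX_PATH={prefix_path}",
        "-S", source_dir,
    ]


def build(source_dir: str = ".") -> None:
    """Configure and build; hypothetical helper, call it from the repo root."""
    cmd = cmake_configure_cmd(torch_cmake_prefix(), source_dir)
    print(shlex.join(cmd))  # show the exact command being run
    subprocess.run(cmd, check=True, cwd=source_dir)
    subprocess.run(["make"], check=True, cwd=source_dir)
```

Calling `build()` from the repository root should be equivalent to running the two commands above, assuming `cmake` and `make` are on the PATH.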

@jiqing-feng
Owner Author

This kernel depends only on PyTorch, which BNB already requires.

Signed-off-by: jiqing-feng <[email protected]>
@yao-matrix

I don't think libtorch is the problem. The concern should be ABI compatibility: if you build against version x, what happens when it runs with version y?

@jiqing-feng
Owner Author

> I don't think libtorch is the problem. The concern should be ABI compatibility: if you build against version x, what happens when it runs with version y?

Yes, the BNB maintainer raised the same point and recommended that I put this implementation in kernel-community. BNB can then pull the kernels from there, which should resolve the problem of building against one version and running with another.
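
To make the build-versus-run concern concrete, here is a rough sketch of a version guard. It assumes a conservative policy (the build-time and run-time PyTorch versions must share the same major.minor), which is an illustration only and not something this PR or PyTorch specifies. The function names are hypothetical.

```python
import re


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.5.1+cpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_build_series(build_version: str, runtime_version: str) -> bool:
    """Assumed policy: treat the binary as usable only if major.minor match."""
    return major_minor(build_version) == major_minor(runtime_version)
```

A loader could record the version it was built against and compare it with `torch.__version__` at import time, falling back to a non-native code path when `same_build_series` returns False.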
